Students' Evaluations of University Instructors:
The Applicability of American Instruments in a Spanish Setting



ABSTRACT 

Items from two American instruments designed to measure students'
evaluations of teaching effectiveness were translated into Spanish and
administered to a sample of Spanish university students. Most of the
items were judged by the students to be appropriate, every item was
chosen by at least a few students as being a most important item, and
all but the Workload/Difficulty items clearly differentiated between
lecturers whom students indicated to be "good", "average", and "poor".
A series of factor analyses clearly identified the factors which the
instruments were designed to measure and which have been identified in
previous research. Finally, a multitrait-multimethod analysis
demonstrated that there was good agreement between factors from the
two instruments which were hypothesized to measure the same components
of effective teaching, and provided support for both the convergent
and divergent validity of the ratings. The findings illustrate the
feasibility of evaluating effective teaching in a Spanish university
and the appropriateness of the two American instruments.

Students' evaluations of teaching effectiveness are commonly
collected at North American universities and colleges, and their use
is widely endorsed by students, faculty, and administrators (Centra,
1979; Leventhal, Perry, Abrami, Turcotte, & Kane, 1981). The purposes
of these evaluations are variously to provide: 1) diagnostic feedback
to faculty about the effectiveness of their teaching; 2) a measure of
teaching effectiveness to be used in tenure/promotion decisions; 3)
information for students to use in the selection of courses and
instructors; and 4) an outcome or process-description measure for
research on teaching. While the first purpose is nearly universal,
the next two are not. At many universities systematic student input
is required before faculty can even be considered for promotion, while
at others the inclusion of students' evaluations is optional.
Likewise, the results of students' evaluations are made public at some 
universities, while at others the results are considered to be 
strictly confidential. The fourth purpose of student ratings, their 
use in research on teaching, has not been systematically examined, and 
this is unfortunate. 

The use of students' evaluations, especially for tenure/promotion
decisions, has not been without opposition, and in the last decade
this has been one of the most frequently studied areas in American
educational research (for reviews see Aleamoni, 1981; Centra, 1979;
Cohen, 1980, 1981; Costin, Greenough, & Menges, 1971; de Wolfe, 1974;
Doyle, 1975; Feldman, 1976a, 1976b, 1977, 1978, 1979, 1982; Kulik &
McKeachie, 1975; Marsh, 1980a, 1982b, in press; Murray, 1980; Overall &
Marsh, 1982). In contrast to the wide use of students' evaluations in
North America, they apparently have not been systematically collected
in universities in other parts of the world, and there has been
little attempt to test the applicability of instruments
developed in the United States, or the generalizability of findings
from American settings, in other countries. The purpose of this
article is to describe two such American instruments, and to report
upon an investigation of their applicability in a Spanish setting.

The Endeavor Instrument

The Endeavor instrument measures seven components of effective
teaching that have been demonstrated with the use of factor analysis
in different settings (Frey, Leonard, & Beatty, 1975). The seven
factors are Presentation Clarity, Workload, Personal Attention, Class
Discussions, Organization-Planning, Grading, and Student
Accomplishment. In validating the ratings obtained from this
instrument, Frey has shown that the ratings on Endeavor are correlated
with student learning (Frey, 1973, 1978; Frey, Leonard, & Beatty,
1975). In these studies, as well as in similar studies described
below, student ratings are collected in large multisection courses
(i.e., courses in which the large group of students is divided into
smaller groups or sections and all instruction is delivered separately
to each section). Each section of students in the same course is
taught throughout by a different lecturer, but each is taught
according to a similar course outline, has similar goals and
objectives and, most importantly, is tested with the same standardized
final examination at the end of the course (see Cohen, 1981; Marsh,
1982b, 1984; Marsh & Overall, 1980, for further discussion). Frey
concluded that those sections of students that rate teaching to be
most effective are also the sections that learn the most as measured
by performance on the final examination, thus supporting the validity
of ratings on the Endeavor instrument.

Frey (1978) further argued that it is important to recognize the
multidimensionality of evaluations of effective teaching. In an
examination of the relationships between students' evaluations and a
variety of other variables he demonstrated that the size, and even the
direction, of the correlations varies with the particular component of
effective teaching that is considered. The failure to recognize this
multidimensionality is an important weakness in much of the American
research.

The SEEQ Instrument

SEEQ (Students' Evaluations of Educational Effectiveness) and the
research that led to its development have been recently summarized
(Marsh, 1982b, 1983, 1984). Numerous factor analyses have identified
the nine SEEQ factors in responses from different populations of
students (e.g., Marsh, 1982b, 1982c, 1983), and also in lecturer
self-evaluations of their own teaching effectiveness when they were
asked to complete the same instrument as their students (Marsh, 1982c;
Marsh & Hocevar, 1983). The nine SEEQ factors are Learning/Value,
Instructor Enthusiasm, Organization/Clarity, Group Interaction,
Individual Rapport, Breadth of Coverage, Examinations/Grading,
Assignments/Readings, and Workload/Difficulty.

Marsh (1982c, 1984), like Frey, argued that students' evaluations,
like the effective teaching they are designed to reflect, should be
multidimensional (e.g., a lecturer can be well organized and still
lack enthusiasm). He supported this common-sense assertion with
empirical results, and also demonstrated that the failure to recognize
this multidimensionality has led to confusion and misinterpretations
in student-evaluation research.

The reliability of responses to SEEQ, based upon differences
among items designed to measure the same factor and differences among
responses by students in the same course, is consistently high (Marsh,
1982b). In order to test the long-term stability of responses to SEEQ,
students from 100 classes were asked to reevaluate teaching
effectiveness several years after their graduation from their
university program, and their retrospective evaluations correlated
0.83 with those the same students had given at the end of each class
(Overall & Marsh, 1980). Ratings on SEEQ have successfully been
validated against the ratings of former students (Marsh, 1977),
student learning as measured by objective examination in multisection
courses (Marsh & Overall, 1980; Marsh, Fleiner, & Thomas, 1975),
lecturer self-evaluations of their own teaching effectiveness (Marsh,
Overall, & Kesler, 1979; Marsh, 1982c), and affective course
consequences such as the application of course materials and plans to
pursue the subject further (Marsh & Overall, 1980). None of a set of
16 "potential biases" (e.g., class size, expected grade, prior subject
interest) could account for more than 5 per cent of the variance in
SEEQ ratings (Marsh, 1980b, 1983), and many of the relationships were
inconsistent with a simple bias explanation (e.g., harder, more
difficult courses were evaluated more favorably). SEEQ ratings are
primarily a function of characteristics of the person who teaches a
course, rather than of the particular course which he or she teaches
(Marsh, 1981b, 1982a; Marsh & Overall, 1981). Finally, feedback from
SEEQ, particularly when coupled with a candid discussion with an
external consultant, led to improved ratings and better student
learning (Overall & Marsh, 1979).

The Present Study

The purposes of the present study are to test the applicability
of the SEEQ and Endeavor instruments in a Spanish setting, and to
replicate the results of a similar study conducted in an Australian
setting where the factors which these surveys are designed to measure
were empirically demonstrated and judged by Australian students to be
appropriate and important (Marsh, 1981a). In the present study, items
from both the SEEQ and Endeavor instruments were translated into
Spanish and administered to a sample of Spanish university students.
Students were asked to select a representative "good", "average", and
"poor" lecturer, to evaluate each with the same set of items, to
indicate inappropriate items, and to select the most important items.
These criteria, in addition to factor analyses of the ratings, were
used to test the applicability of these American instruments in a
Spanish setting.

METHOD 

Sample and Procedures

The evaluation instrument was administered to a total of 209
students who were currently enrolled in the Universidad de Navarra.
The subjects were second-, third- and fourth-year university students,
primarily between 19 and 21 years of age, who were in the process of
completing degrees in education, architecture, or law. Students who
volunteered to participate were read instructions about the study,
after which they completed the instrument. Students were not asked to
put their names on the instrument, and the confidentiality of their
responses was guaranteed. There was no time limit for completing the
instrument, but most students had completed it within about 70
minutes. All instruments were administered by the second author of the
study.

Each evaluation instrument contained a cover page with
instructions and demographic items, and requested that students select
a "good", an "average" and a "poor" lecturer from their university
experience. They were asked to try to limit their choices to
lecturers who were in charge of an instructional sequence which lasted
at least one term, and who taught courses that employed a
lecture/discussion format. Students were then asked to fill out three
separate questionnaires, one each for the good, average, and poor
lecturers. The items, in paraphrased form, and the components of
effective teaching which they are hypothesized to measure appear in
Table 1. Students responded to each item on a nine-point response
scale which varied from "1-very poor, very low, or almost never" to
"9-very good, very high, or almost always". An additional "not
appropriate" response was provided for items that were not relevant to
the course being evaluated (responses to items left blank were also
counted as "not appropriate"). After completing the ratings for a
given lecturer, students were asked to select up to five questions
that they felt were "most important in describing either positive or
negative aspects of the overall learning experience in this
instructional sequence".

Statistical Analyses

Each item was initially tested in terms of: (a) its ability to
discriminate among the good, average, and poor instructors; (b) its
appropriateness (i.e., the lack of "not appropriate" responses); and
(c) its importance (i.e., the number of "most important" nominations).
Items were categorized as representing ten dimensions on an a priori
basis (support for these dimensions was found in the Australian study
described by Marsh, 1981a) and a factor analysis of responses to all
items was used to test the ability of the responses to differentiate
among these hypothesized components of teaching effectiveness.
Separate factor analyses were also performed on responses to items
from the SEEQ and the Endeavor instruments, and factor scores derived
from these analyses were used to determine the relationship between
SEEQ and Endeavor factors.

All the statistical analyses were conducted with the
commercially available SPSS statistical package (Hull & Nie, 1981). A
separate one-way analysis of variance (ANOVA) was used to test the
ability of each item to discriminate between "good", "average", and
"poor" teachers, and differences between the three groups were then
broken into linear and nonlinear components (Nie et al., 1976, p.
425). The factor analyses were performed with iterated communality
estimates, a Kaiser normalization, and an oblique rotation, also using
the SPSS procedure.
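
The linear and nonlinear decomposition described above can be illustrated with a short Python sketch. It is not the SPSS procedure used in the study. It assumes equal group sizes, equally spaced levels (poor, average, good), and expresses each component as a percentage of the total sum of squares, which is only one plausible reading of the "variance explained" columns in Table 1. The function name and the sample ratings are hypothetical.

```python
from statistics import mean


def trend_decomposition(good, average, poor):
    """Split between-group variation into linear and nonlinear parts.

    Assumes equal group sizes, so the linear contrast (-1, 0, 1) and the
    remaining (quadratic) component are orthogonal.
    Returns (linear %, nonlinear %) of the total sum of squares.
    """
    groups = [poor, average, good]  # ordered levels: poor < average < good
    n = len(good)
    if len(average) != n or len(poor) != n:
        raise ValueError("this sketch assumes equal group sizes")

    means = [mean(g) for g in groups]
    grand = mean(x for g in groups for x in g)

    ss_total = sum((x - grand) ** 2 for g in groups for x in g)
    ss_between = sum(n * (m - grand) ** 2 for m in means)

    coefficients = (-1, 0, 1)
    contrast = sum(c * m for c, m in zip(coefficients, means))
    ss_linear = n * contrast ** 2 / sum(c * c for c in coefficients)
    ss_nonlinear = ss_between - ss_linear

    return 100 * ss_linear / ss_total, 100 * ss_nonlinear / ss_total


# Hypothetical ratings of one item by three students (not data from the study).
linear_pct, nonlinear_pct = trend_decomposition(
    good=[8, 7, 9], average=[6, 5, 7], poor=[3, 2, 4]
)
```

With unequal group sizes the two components are no longer orthogonal under this simple formula, so a weighted contrast would be needed.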

For purposes of this study, blank and "not appropriate" responses
were considered to be missing values. Each of the factor analyses was
performed on correlation matrices constructed with "pair-wise
deletion" for missing data. Factor scores derived from these analyses
were used to represent the SEEQ and Endeavor factors, and consisted of
weighted averages of responses to each item. Factor scores, based
upon weighted averages of nonmissing values, were computed for each
student so long as at least 75% of the responses were completed (for
further discussion of how factor scores were derived and how missing
data was handled, see Nie et al., 1976, p. 496).
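
A minimal sketch of the missing-data rule for factor scores follows. It is a simplification: the SPSS procedure cited above applies factor score coefficients to standardized responses, whereas this sketch takes a plain weighted average of raw responses. The function name, the weights, and the example responses are hypothetical; only the 75% completeness rule comes from the text.

```python
def factor_score(responses, weights, min_complete=0.75):
    """Weighted average of non-missing responses.

    responses: list of ratings, with None for blank or "not appropriate".
    weights:   one positive weight per item (hypothetical values here).
    Returns None when fewer than min_complete of the items were answered.
    """
    present = [(r, w) for r, w in zip(responses, weights) if r is not None]
    if len(present) < min_complete * len(responses):
        return None
    total_weight = sum(w for _, w in present)
    return sum(r * w for r, w in present) / total_weight


# Three of four items answered (75%): a score is computed.
score = factor_score([8, None, 6, 7], [0.5, 0.2, 0.2, 0.1])

# Two of four items answered (50%): the score is treated as missing.
too_sparse = factor_score([8, None, None, 7], [0.5, 0.2, 0.2, 0.1])
```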

Preliminary inspection of the content of the SEEQ and Endeavor
instruments revealed considerable overlap in the dimensions defined by
each. Five SEEQ factors (Learning/Value, Group Interaction,
Individual Rapport, Examinations/Grading, and Workload/Difficulty)
appear to correspond closely to five Endeavor factors (Student
Accomplishments, Class Discussion, Personal Attention, Grading, and
Workload). A sixth SEEQ factor, Organization/Clarity, seems to have
been divided into two factors for the Endeavor instrument
(Presentation Clarity and Organization/Planning). Three SEEQ factors,
Instructor Enthusiasm, Breadth of Coverage, and Assignments/Readings,
do not appear to correspond to any of the Endeavor factors. On the
basis of this preliminary inspection and results of the Australian
study, 32 of the 34 SEEQ items (M1 - M32), the 21 Endeavor items (F1 -
F21), and seven additional items (A1 - A7) were each classified into
one of ten dimensions (see Table 1). Two other SEEQ items, not
specifically designed to measure a particular factor, are overall
ratings of the instructor (M31) and the course (M30).

Insert Table 1 About Here 

With the exception of the Workload/Difficulty items, all items
significantly (p < .001; see Table 1) differentiate among the "good",
"average", and "poor" instructors in the predicted direction (i.e.,
"average" instructors were evaluated significantly lower than "good"
instructors and significantly higher than "poor" instructors).
Furthermore, nearly all of the differences among the three groups are
explained by the linear component (i.e., the "variance explained" by
the linear component is generally 30 to 60 times as large as the
remaining variance which is explained by a nonlinear component). The
differences are particularly large for the Instructor Enthusiasm,
Presentation Clarity, and Learning/Value/Accomplishment dimensions.
Workload/Difficulty items do not differentiate among the three groups
as clearly. Nevertheless, "good" instructors tend to teach courses
which are judged to be more difficult and to require more work.

Students were specifically asked to indicate items that were
inappropriate. Nine of the 62 items were judged to be inappropriate
by more than 10% of the students (see Table 1). These included all
six of the Assignments/Reading items, and items related to feedback
from examinations, ability to get individual attention, and discussion
of current developments. The number of inappropriate responses to the
Assignments/Reading items suggests that outside assignments are not
necessarily a part of courses in this Spanish university.
Nevertheless, a majority of the items were judged to be appropriate by
95% or more of the students, which indicates that the items
are generally appropriate in this Spanish setting.

Students selected as many as five items that they felt were most
important in describing positive or negative aspects of the overall
learning experience. Each of the 62 items, even those seen as
inappropriate by 10% or more of the students, received at least 8
nominations, and at least one item from each of the ten categories
received 32 or more nominations (see Table 1). Four items received
over 100 nominations: course challenging & stimulating (M1), lecturer
enthusiastic about teaching (M5), teaching style held your interest
(M8), and lecturer explanations were clear (M9). Items in the
Learning/Value/Accomplishments, Instructor Enthusiasm, and
Presentation Clarity categories were nominated most frequently. While
some of the items and some of the dimensions were seen as more
important, the nominations were spread widely over the entire set of
items. This suggests that each of the dimensions measures a
potentially important component of effective teaching.

Factor Analyses of the Combined Set of Items

Based upon an a priori examination of the content of each item
and the results of the Australian study (Marsh, 1981a), it was
hypothesized that the 62 items would measure 10 components of teaching
effectiveness. This hypothesis was empirically tested through the
application of factor analysis. The results (see Table 2) demonstrate
that each of the 10 factors is identified with remarkable clarity.
With the exception of two items (F3 & F13), each item loads
substantially on the factor it was designed to measure (target
loadings) and less substantially on the other nine factors (nontarget
loadings). A majority of the target loadings are greater than 0.55,
and only three are less than 0.30. A majority of the nontarget
loadings are less than 0.1, 95% are less than 0.2, and less than 1%
are greater than 0.3.
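
The summary of target and nontarget loadings reported above could be tabulated along the following lines. This is an illustrative sketch only: the function name, the data layout, and the example loadings are hypothetical and are not taken from Table 2, and comparing absolute values is an assumption made here because the sign of a loading depends on how each factor is oriented.

```python
def summarize_loadings(loadings, targets):
    """Summarize target and nontarget factor loadings.

    loadings: dict mapping item label -> list of loadings, one per factor.
    targets:  dict mapping item label -> index of the factor the item
              was designed to measure.
    """
    target, nontarget = [], []
    for item, row in loadings.items():
        for k, value in enumerate(row):
            # Sort each loading into the target or nontarget pool.
            (target if k == targets[item] else nontarget).append(abs(value))
    return {
        "share_target_above_0.55": sum(v > 0.55 for v in target) / len(target),
        "count_target_below_0.30": sum(v < 0.30 for v in target),
        "share_nontarget_below_0.10": sum(v < 0.10 for v in nontarget) / len(nontarget),
        "share_nontarget_above_0.30": sum(v > 0.30 for v in nontarget) / len(nontarget),
    }


# Hypothetical loadings for three items on two factors.
example = {"M1": [0.62, 0.05], "M5": [0.08, 0.71], "F3": [0.25, 0.35]}
intended = {"M1": 0, "M5": 1, "F3": 0}
summary = summarize_loadings(example, intended)
```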

Insert Table 2 About Here

The overall ratings of the instructor and course are not
specifically designed to measure a particular dimension, but results
from North American studies indicate that they load most highly on the
Instructor Enthusiasm and Learning/Value dimensions respectively
(Marsh, 1983). However, in the Australian study, the Overall
Instructor Rating loaded most highly on the Presentation Clarity
dimension, though the Overall Course rating still loaded most
substantially on the Learning/Value dimension. In the Spanish
setting, both the Overall Course and Overall Instructor ratings load
most highly on the Clarity dimension, and to a lesser extent on the
Instructor Enthusiasm dimension.

Analyses of Responses to SEEQ and Endeavor Instruments

Two separate factor analyses, of responses to the 34
SEEQ items (see Table 3) and to the 21 Endeavor items (see Table 4),
each clearly identify the factors which those instruments were
designed to measure. For both analyses every target loading is at
least 0.3, and a majority are greater than 0.5. Few nontarget
loadings in either analysis are as large as 0.3, and most are less
than 0.1. Factor scores used in the analysis described below were
based upon these factor analyses.

Insert Tables 3 & 4 About Here

Correlations between the nine SEEQ and seven Endeavor factors
(see results of Spanish study in Table 5) are presented in a form
somewhat analogous to a multitrait-multimethod (MTMM) matrix, where
the dimensions of teaching effectiveness are the multiple traits and
the different instruments correspond to the multiple methods.
Convergent validity refers to the correlations between SEEQ and
Endeavor dimensions that are hypothesized to measure the same
construct, while discriminant validity refers to the distinctiveness
of the different dimensions and provides a test of the
multidimensionality of the ratings. Typical MTMM analyses (see Marsh
& Hocevar, 1983) would require that the same dimensions be assessed by
the two instruments, but with minor modifications, the criteria
developed by Campbell and Fiske (1959) can be applied to test for
convergent and divergent validity in these data.

1. Convergent validities, correlations between SEEQ and Endeavor
factors that are hypothesized to match (correlations in boxes in Table
5), should be substantial. Here the convergent validities vary
between 0.71 and 0.93, and clearly satisfy this criterion.

2. One criterion of discriminant validity is that correlations
between these matching factors should be higher than the correlations
between nonmatching SEEQ and Endeavor factors in the same row or
column of the rectangular submatrix. The application of this
criterion requires that each of the seven convergent validities be
compared with 14 other correlations. This test is met for 97 of the
98 comparisons, and clearly satisfies the second criterion.

3. Another criterion of discriminant validity is that
correlations between these matching factors should be higher than
correlations in the same row or column of the triangular
submatrices. The application of this criterion requires that each
convergent validity be compared with eight correlations involving
other SEEQ factors and six correlations involving other Endeavor
factors. This test is met for all 98 of these comparisons, and
clearly satisfies the third criterion.

4. The pattern of correlations among SEEQ factors should be
similar to the pattern of correlations among Endeavor factors (e.g.,
as the two SEEQ factors of Group Interaction and Individual Rapport
are highly correlated, then so should be the two Endeavor factors of
Class Discussion and Personal Attention). A visual inspection of the
correlations in Table 5 demonstrates the similarity in the patterns of
correlations.
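
The counting behind the second criterion can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used: the function name and the small correlation matrix are hypothetical, and the data layout (rows for SEEQ factors, columns for Endeavor factors) is chosen here for convenience. With nine rows, seven columns and seven matched pairs, the loop makes 7 x (6 + 8) = 98 comparisons, which agrees with the count given in the text.

```python
def discriminant_check(corr, matches):
    """Count how often a convergent validity exceeds its row and column.

    corr[i][j]: correlation of SEEQ factor i with Endeavor factor j
                (the rectangular submatrix).
    matches:    list of (i, j) pairs hypothesized to measure the same
                construct.
    Returns (comparisons passed, comparisons made).
    """
    passed = made = 0
    for i, j in matches:
        convergent = corr[i][j]
        same_row = [corr[i][k] for k in range(len(corr[i])) if k != j]
        same_col = [corr[k][j] for k in range(len(corr)) if k != i]
        for other in same_row + same_col:
            made += 1
            passed += convergent > other
    return passed, made


# Hypothetical 3 x 2 submatrix with two matched pairs (not values from Table 5).
toy = [[0.80, 0.30],
       [0.20, 0.90],
       [0.40, 0.95]]
toy_result = discriminant_check(toy, [(0, 0), (1, 1)])
```

The third criterion follows the same pattern, except that the comparison values are drawn from the correlations among factors within each instrument rather than from the rectangular submatrix.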

Insert Table 5 About Here

For purposes of comparison, the corresponding correlations from
the Australian study also appear in Table 5. Results described above
for the present study are similar to those in the Australian data with
one major exception: in the Australian study the correlation between
the SEEQ Grading/Examinations factor and the Endeavor Exam factor is
not nearly so high as in the Spanish data. This exception is
primarily due to the poor definition of the SEEQ
Grading/Examinations factor in the Australian study. Nevertheless,
with this exception, there is a striking similarity between the
results of the two studies.

Discussion and Implications

Items from two American instruments designed to measure students'
evaluations of teaching effectiveness were translated into Spanish,
and administered to a sample of Spanish university students. Most of
the items were judged to be appropriate by the students, every item
was chosen by at least a few as being most important, and all but the
Workload/Difficulty items clearly differentiated between lecturers whom
students indicated to be "good", "average", and "poor". A series of
factor analyses clearly identified the factors which the instruments
were designed to measure and which have been identified in previous
research. Finally, factors on the SEEQ and Endeavor instruments which
were hypothesized to measure similar dimensions of effective teaching
were found to be substantially correlated, while correlations between
nonmatching factors were substantially smaller.

An important aspect of the present study was to determine if
components of effective teaching identified in responses by American
university students could also be identified in responses by Spanish
students. The identification of distinct components suggests that
students are differentiating among various components of teaching
effectiveness and not just judging lecturers on a general good-bad
dimension. Furthermore, the earlier discussion proposes that students'
evaluations cannot be adequately understood if this
multidimensionality is ignored. The demonstration of a clearly
defined factor structure which corresponds to that found in the
Australian study, as well as in American settings, argues that Spanish
students do differentiate among different components and that the
specific components have a remarkable generality across quite
different nationalities. Similarly, the MTMM analysis of responses to
the SEEQ and Endeavor instruments shows that students differentiate
among dimensions of effective teaching in a similar manner with both
instruments.

Despite the strong evidence for the separation of the various
dimensions of effective teaching, there still existed substantial
correlations among some of the factors in both the Australian and
Spanish studies. For the SEEQ factors, correlations among the
Learning/Value, Instructor Enthusiasm, and Organization/Clarity
factors were all high, as was the correlation between the Group
Interaction and Organization/Clarity factors. Among the Endeavor
factors, Organization/Planning and Clarity were highly correlated,
while correlations between Personal Attention and Group Discussion,
Organization/Planning and Student Accomplishments, and Clarity and
Student Accomplishments were also high. However, several points are
relevant to interpreting these high correlations. First, these
correlations were substantially lower than the reliabilities of the
factors and even lower than the convergent validities observed in the
MTMM analysis. Second, these correlations are based upon responses by
individual students, where halo/method effects are likely to have a
relatively large impact. Students' evaluations are typically
summarized by the average response by all the students in a given
course, and halo effects specific to particular students are likely to
cancel out. Third, by specifically asking students to select "good",
"average", and "poor" teachers, the ratings are likely to be
stereotypic and biased against differentiation among dimensions (e.g.,
there would be a tendency to rate "bad" lecturers as poor on all
items). Finally, some of the differentiation among components may be
lost when students are asked to make retrospective ratings of former
lecturers rather than to evaluate current lecturers.

These findings clearly demonstrate that teaching effectiveness
can be measured in a Spanish setting, that evaluation instruments
developed at American universities are appropriate in a Spanish
setting, and that the same components that underlie evaluations of
teaching effectiveness at American universities also apply in Spanish
settings. These same conclusions also resulted from the similar study
which was conducted at an Australian university. Taken together,
these two studies suggest the possibility that students' evaluations 
of teaching effectiveness and components such as those contained in 
the SEEQ and Endeavor instruments may be applicable to any university 



An important and provocative question raised by these findings is
why students' evaluations are so widely employed at North American
universities, but not at universities in other countries. The
conclusions of this article and the Australian study suggest that
teaching effectiveness can be measured by students' evaluations in
different countries and that perhaps other findings from research
conducted in North America may generalize as well, so a lack of
applicability is not the reason. A more likely explanation is the
political climate in American universities. While the study of
students' evaluations has a long history in the United States, it was
only in the late 1960's and 1970's that they became widely used.
During this period there was a marked increase in student involvement
in university policy making and also an increased emphasis on
"accountability" in universities.

While the impetus for the increased use of students' evaluations
of teaching effectiveness in North American universities may have been
the political climate, subsequent research has shown them to be 
reliable, valid, relatively free from bias, and useful to students, 
lecturers, and administrators. Future research in the use of 
students' evaluations in different countries needs to take three 
directions. First, in order to test the generality of the conclusions 
in this article, the study described here should be replicated in 
other countries. Second, the validity of the students' evaluations 
must be tested against a wide variety of indicators of effective 
teaching in different countries as has been done in American research 
described earlier. Third, perhaps employing the instruments used in 
this study, there is a need to examine and document the problems 
inherent in the actual implementation of broad, institutionally-based 
programs of students' evaluations of teaching effectiveness in 
different countries. 

Footnotes

1. This study, though similar to the Australian study, differs in
several ways. First, Australian students were asked to select a
"best" and a "worst" lecturer, while Spanish students selected a
"good", "average", and "poor" lecturer. Second, Australian students
made their responses on a five-point response scale rather than on a
nine-point response scale. Third, though all the SEEQ and Endeavor
items were used in each study, items "01", "06", and "07" were used
only in the Spanish study. These changes were made in order to better
define the various factors and are recommended for additional
replications of the study in different settings. Researchers who are
interested in replicating this study are encouraged to contact the
first author.

TABLE 1

Hypothesized Factors, Individual Items and Their Characteristics

      Discrimination Among Lecturers
        Mean Response For    Variance Explained By:         Not         Most
        Lecturers Chosen As:      Linear   Nonlinear Appropriate    Important
Item    Good  Average   Poor  Component   Component   Responses  Nominations

M1       7.5       --    2.3       61.4         1.7          --          110
M2       7.5       --    3.6       39.3         1.1          --           89
M3       7.2       --    2.7       49.3         1.2          --           70
M4       7.5      6.4    4.5       33.1         0.9          --           23
F19      7.4      6.2    3.7       35.5         0.1          --           19
F20      7.4      5.9    3.4       46.7         1.0          --           51
F21      7.5      6.2    3.7       43.7         1.4          --           53

M5       8.1      6.6    4.2       43.7         8.9           5          163
M6       7.4      5.7    2.7       53.6         0.3          17           43
M7       7.2      5.2    3.0       39.7         0.0          22           72
M8       7.9      5.4    2.0       68.7         0.4          15          121
A1       8.2      6.4    3.9       46.4         0.5           9           82

M9       7.9      6.3    2.6       62.9         3.6           4          184
M10      7.9      6.1    2.4       67.2         2.7          12           92
M12      7.4      5.7    2.4       52.7         0.2          21           64
F1       7.6      6.1    2.7       61.0         2.6          43           50
F2       7.8      5.9    2.6       64.8         1.5          10           86
F3       7.7      6.4    3.8       45.4         1.2          12           58

M11      7.6      6.2    3.9       37.8         0.5          31           42
F13      8.2      6.8    4.1       45.1         1.5          12           87
F14      7.0      6.0    3.9       25.9         1.3          --           18
F15      7.2      5.8    3.6       39.0         0.6          --           30
A2       7.1      5.7    3.9       39.0         0.9          --           47
A3       7.4      5.9    3.4       43.9         0.9          --           44

M13      6.3      5.8    3.6       20.3         2.5          20           48
M14      6.6      5.4    3.3       30.8         0.6          47           28
M15      7.0      5.5    3.1       42.4         0.7          12           19
M16      6.6      5.2    3.4       29.0         0.1          31            8
F10      8.0      6.5    4.5       38.7         0.4           6           34
F11      6.9      5.6    3.5       30.1         0.6          21           35
F12      6.5      5.2    3.2       32.3         0.5          34           20

M17      7.4      6.4    4.1       31.0         1.5           7           86
M18      7.1      5.4    3.0       41.1         0.5          49           36
M19      6.4      4.8    2.6       38.6         0.3          36           52
M20      6.6      5.6    4.1       16.2         0.3          --           31
F7       7.2      5.7    3.6       33.1         0.3          --           42
F8       7.7      6.0    4.2       34.9         0.0          --           36
F9       6.8       --    3.2       35.0         0.0          --           64

M21      7.0      6.0    4.0       28.9         2.0          45           41
M22      7.0      6.0    3.8       31.8         1.4          33           --
M23      6.8      5.7    3.3       35.0         1.3          60           28
M24      6.9      5.9    3.9       29.7          --         115           22

M25      6.5      5.2    2.8       36.3         1.1          89           28
M26      6.9      5.8    3.6       31.9         1.1          38           76
M27      6.7      5.6    3.5       30.1         1.0          57           28
F16      7.0      5.8    3.7       31.6         0.7          54           87
F17      6.5      5.5    3.4       29.0         1.5          54           39
F18      6.7      5.5    3.4       33.3         1.1          54           20

M28       --       --    4.3       19.5         0.4         132           16
M29      6.8      5.7    3.6       33.9         1.2          99           14
A4       6.4      5.4    3.3       27.0         1.1          98           32
A5       7.3      5.8    4.5       20.0         0.0         121           17
A6       6.9      5.7    4.1       27.7         0.2         134           11
A7       7.4      6.1    4.3       30.2         0.2         125           12

M32      6.0      5.4    5.4        1.4         0.8          --           19
M33      3.9       --     --         --         0.2          --           19
M34       --       --    4.9        0.2         0.2          --           21
F4       7.4      6.3    5.6       11.0         0.2          --           33
F5       6.9      6.1     --        6.9         0.2          --           19
F6       6.1       --    5.3        1.4         0.0          --           33

M31      8.2      5.9    2.5       81.3         1.0          12           49
M30      8.1      5.9    2.3       79.6         1.8          13           28

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NOTE: The Endedvor -factors of Pr^«enf Af i ri*»-4*. ^ ^* 
recresented by a ttlnalo fArto*- ni«^ Clarity and PUnning/Qb jecti ves are 

19 



. . F.ctar An.ly«mi of R»ponm» to All itUs <N=627 s«t8 of ratings) 
LEARNING/VALUE 
M1   Course challenging & stimulating 
M2   Learned something valuable 
M3   Increase subject interest 
M4   Learned & understood subject matter 
F19  Understood the advanced material 
F20  Ability to analyze issues 
F21  Increased knowledge & competence 

INSTRUCTOR ENTHUSIASM 
M5   Enthusiastic about teaching 
M6   Dynamic and energetic 
M7   Enhanced presentation with humor 
M8   Teaching style held your interest 
A1   Seems to enjoy teaching 

PRESENTATION CLARITY (ORGANIZATION) 
M9   Lecturer explanations clear 
M10  Materials well explained & prepared 
M12  Lectures facilitated taking notes 
F1   Presentations clarified materials 
F2   Presented clearly & summarized 
F3   Made good use of examples 

PLANNING/OBJECTIVES (ORGANIZATION) 
M11  Course objectives stated & pursued 
F13  Presentations planned in advance 
F14  Provided detailed course schedule 
F15  Activities orderly scheduled 
A2   Time distributed over topics 
A3   Announced goals &/or criteria 

GROUP INTERACTION/DISCUSSION 
M13  Encouraged class discussion 
M14  Students shared knowledge/ideas 
M15  Encouraged questions & gave answers 
M16  Encouraged expression of ideas 
F10  Class discussion was welcome 
F11  Students encouraged to participate 
F12  Encouraged students to express ideas 

INDIVIDUAL RAPPORT 
M17  Friendly towards individual students 
M18  Welcomed students seeking help/advice 
M19  Interested in individual students 
M20  Accessible to individual students 
F7   Listened & was willing to help 
F8   Able to get personal attention 
F9   Concerned about student difficulties 

BREADTH OF COVERAGE 
M21  Contrasted various implications 
M22  Gave background of ideas/concepts 
M23  Gave different points of view 
M24  Discussed current developments 

EXAMINATIONS/GRADING 
M25  Examination feedback valuable 
M26  Evaluation methods fair/appropriate 
M27  Tested course content as emphasized 
F16  Grading was fair and impartial 
F17  Grading reflected student performance 
F18  Grading indicative of accomplishments 

      (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)  (9)  (10) 
M1     34   27   25  -03   06   03   14   10   08   07 
M2     63   04   09   01   07   07   12   00   05   02 
M3     47   22   06   06   05   09   15   03   08  -01 
M4     70   01   02   15   00   04  -04   10   0?  -08 
F19    53   09   15   17   04   05  -04   03   07  -13 
F20    58   10   03   07   05   00   15   05   04   10 
F21    66   03   04   01  -01   14   12   10   07   10 



05 56 08 -01 07 13 

14 31 20 06 07 02 

07 49 03 04 IB 09 

22 45 22 06 07 -01 

07 61 00 05 08 15 



20 -04 05 01 

13 12 16 14 

08 05 03 07 
00 10 



19 01 



15 05 
04 04 



16 



13 22 
02 



24 50 
50 



19 02 -02 04 02 06 -01 



25 35 
19 46 
29 41 



15 
09 

07 37 11 



04 08 04 
12 09 -09 



21 -02 00 11 

?7 04 05 01 

5 -01 04 13 07 05 03 

21 00 00 06 U 11 -02 

02 07 02 19 20 07 04 



14 -07 22 
06 16 29 

-02 -11 08 58 16 

08 -04 25 44 -01 

14 15 -08 57 " 

08 25 -09 50 



58 02 -02 09 08 04 01 
13 -11 12 30 09 -02 09 
02 09 10 03 

05 18 

06 -05 
04 08 



03 
12 



14 01 
10 



14 

07 



10 



16 01 
13 -01 



07 06 02 01 70 06 04 02 05 00 
03 06 05 02 66 0? 15 07 08 -01 



01 11 11 

00 08 -01 

04 04 21 03 

06 08 -03 10 

03 07 -02 



16 51 12 15 07 08 00 

09 69 16 09 05 06 -02 

44 23 15 08 03 00 

73 09 06 03 08 03 

15 66 14 13 06 02 -01 



00 06 IB -06 29 39 09 12 

02 02 24 04 21 50 -01 17 

03 14 07 06 17 
11 01 -04 07 -01 



52 03 15 
65 08 -04 



01 -03 23 -02 28 52 05 I9 
06 05 
-02 10 



12 -01 09 60 09 11 
08 07 26 51-01 13 



08 -06 

09 00 
06 00 
11 05 
04 04 
06 04 

10 04 



READING/ASSIGNMENTS 
M29  They contributed to understanding 
A4   They encouraged further exploration 
A5   They were [illegible] 
A6   Appropriate in length & difficulty 
A7   They were related to class work 

WORKLOAD/DIFFICULTY 
M32  Course difficulty (easy-hard) 
M33  Course workload (light-heavy) 
M34  Course pace (slow-fast) 
F4   Students had to work hard 
F5   Course required a lot of work 
F6   Course workload was heavy 

00 01 11 -02 12 01 67 03 04 01 
05 15 07 08 10 -01 49 09 04 00 

-04 03 08 08 10 10 57 04 10 -03 

12 12 -14 09 -04 16 57 03 13 02 

03 -06 19 10 04 20 18 25 19 01 

01 04 09 05 06 04 04 77 01 -02 
10 07 -09 18 -01 13 18 38 15 -07 
01 08 00 07 -02 11 03 78 03 01 

TABLE 3 

LEARNING/VALUE 
M1   Course challenging & stimulating 
M2   Learned something valuable 
M3   Increase subject interest 
M4   Learned & understood subject matter 

INSTRUCTOR ENTHUSIASM 
M5   Enthusiastic about teaching 
M6   Dynamic and energetic 
M7   Enhanced presentation with humor 
M8   Teaching style held your interest 

ORGANIZATION/CLARITY 
M9   Lecturer explanations clear 
M10  Materials well explained & prepared 
M11  Course objectives stated & pursued 
M12  Lectures facilitated taking notes 

GROUP INTERACTION/DISCUSSION 
M13  Encouraged class discussion 
M14  Students shared knowledge/ideas 
M15  Encouraged questions & gave answers 
M16  Encouraged expression of ideas 

INDIVIDUAL RAPPORT 
M17  Friendly to individual students 
M18  Welcomed students seeking advice 
M19  Interested in individual students 
M20  Accessible to individual students 

BREADTH OF COVERAGE 
M21  Contrasted various implications 
M22  Gave background of ideas/concepts 
M23  Gave different points of view 
M24  Discussed current developments 

GRADING/EXAMINATIONS 
M25  Examination feedback valuable 
M26  Evaluation methods fair/appropriate 
M27  Tested course content as emphasized 

READING/ASSIGNMENTS 
M28  Readings/texts were valuable 
M29  They contributed to understanding 

WORKLOAD/DIFFICULTY 
M32  Course difficulty (easy-hard) 
M33  Course workload (light-heavy) 
M34  Course pace (slow-fast) 

OVERALL RATING ITEMS 
M31  Overall Instructor Rating 
M30  Overall Course Rating 

NOTE: The factor loadings in boxes, the target loadings, are for items 
designed to measure the factor. 

TABLE 4 

Factor Analysis of Endeavor Items (N=627 sets of ratings) 

PRESENTATION CLARITY 
F1   Presentations clarified materials 
F2   Presented clearly & summarized 
F3   Made good use of examples 

WORKLOAD/DIFFICULTY 
F4   Students had to work hard 
F5   Course required a lot of work 
F6   Course workload was heavy 

INDIVIDUAL RAPPORT/PERSONAL ATTENTION 
F7   Listened & was willing to help 
F8   Able to get personal attention 
F9   Concerned about student difficulties 

CLASS DISCUSSION 
F10  Class discussion was welcome 
F11  Students encouraged to participate 
F12  Encouraged students to express ideas 

ORGANIZATION/PLANNING 
F13  Presentations planned in advance 
F14  Provided detailed course schedule 
F15  Activities orderly scheduled 

GRADING 
F16  Grading was fair and impartial 
F17  Grading reflected student performance 
F18  Grading indicative of accomplishments 

LEARNING/VALUE/ACCOMPLISHMENTS 
F19  Understood the advanced material 
F20  Ability to analyze issues 
F21  Increased knowledge & competence 



(1) (2) (3) (4) (5) (6) (7) 



60 
63 
50 



06 
03 
08 



05 
02 
13 



11 86 03 
-03 87 00 
-03 85 -02 



06 01 
03 01 
05 02 



21 00 33 
01 04 07 
05 -01 10 



35 07 20 



-04 



00 



08 00 04 



05 01 14 

02 -01 -02 

03 01 -02 



27 -12 -07 
02 05 09 
02 06 05 



04 
08 
11 



-02 
.02 
01 



64 20 
70 04 
48 30 



-08 
19 
-01 



16 
15 

-03 



01 
04 
00 



06 
06 
08 



06 
1 1 
15 



NOTE: The factor loadings in boxes, the target loadings, are for items 
designed to measure the factor. 

TABLE 5 

MTMM Matrix of Correlations Among SEEQ and Endeavor Factors From Responses By 
Spanish Students (N=627 sets of ratings) and Australian Students (N=Sl6 sets) 

NOTE: Coefficients in paren" 
boxes are the convergent validities. 
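
What the note means by convergent validities can be illustrated with a minimal sketch. It assumes a hypothetical block of correlations in which rows are SEEQ factors and columns are the Endeavor factors hypothesised to match them, listed in the same order. The factor names are shortened and the correlations are invented for illustration; they are not values from Table 5. Under that assumption the diagonal entries are the convergent validities, and one Campbell-Fiske style check of divergent validity is that each of them exceeds the other correlations in its row and column.

```python
# Hypothetical correlations (NOT values from Table 5). Rows are SEEQ factors,
# columns are the matching Endeavor factors, in the same order.
factors = ["Learning", "Group Interaction", "Workload"]
hetero_block = [
    [0.80, 0.30, 0.05],
    [0.25, 0.85, 0.02],
    [0.04, 0.06, 0.90],
]


def convergent_validities(block):
    """Return the diagonal: correlations between matching factors."""
    return [block[i][i] for i in range(len(block))]


def exceeds_row_and_column(block):
    """For each factor, report whether its convergent validity is larger
    than every other correlation in the same row and the same column."""
    n = len(block)
    results = []
    for i in range(n):
        others = [block[i][j] for j in range(n) if j != i]
        others += [block[j][i] for j in range(n) if j != i]
        results.append(all(block[i][i] > r for r in others))
    return results


if __name__ == "__main__":
    rows = zip(factors, convergent_validities(hetero_block),
               exceeds_row_and_column(hetero_block))
    for name, validity, ok in rows:
        print(f"{name}: convergent validity {validity:.2f}, "
              f"exceeds row/column correlations: {ok}")
```

The comparison is kept to a single block for brevity; a full multitrait-multimethod analysis would also compare the convergent validities with the correlations among different factors within each instrument.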